Shifted COCG method and its application to double orbital extended Hubbard model 



00 
O 

o 

(N 
X> 

PL, 

O 

(N 



•4— > 

CO 



i 

c 

o 
o 



> 
o 

On 

(N 

(N 
O 
00 

O 



Susumu Yamamoto, 1 Tomohiro Sogabe, 2 Takeo Hoshi, 3, 1 Shao-Liang Zhang, 2 and Takeo Fujiwara 4,1 

1 Core Research for Evolutional Science and Technology, 
Japan Science and Technology Corporation (CREST-JST), Japan 

2 Department of Computational Science and Engineering, 
Nagoya University, Furo-cho, Chikusa-ku,Nagoya 464-8603, Japan 

'Department of Applied Mathematics and Physics, 
Tottori University, 4-101 Koyama-Minami, Tottori, 680-8550 Japan 
^Center for Research and Development of Higher Education, 
The University of Tokyo, Tokyo 113-8656, Japan 
(Dated: February 21, 2008) 

We explains the shifted COCG method which can solve a series of the linear equations generated 
by numbers of scaler shifts, without time consuming matrix- vector operations, except at the only one 
reference energy. This is a family of the CG method and sharing the robustness and the capability 
of the accuracy estimation. Then shifted COCG is quite useful to calculate the Green's function of 
the many-electron Hamiltonian which have very large dimension. We applied it to the double orbital 
extended Hubbard model with twelve electrons on the periodic v8 x v8 site system, the dimension 
of the Hamiltonian equals to 64,128,064, and found the ground state is insulator. We also explained 
the crucial points of the shifted COCG algorithm for reducing the amount of required memory. 

PACS numbers: 71.15.Dx,71.27.+a,71.10.Fd,02.60.Dc 



I. INTRODUCTION 



Strongly interacting systems attract considerable at- 
tention because of fruitful phenomena in a new field of 
cross- correlation physics 1 - and their potential applicabil- 
ity to developing field of spintronics. Theoretical study of 
strongly correlated systems, e.g. many-electron systems 
and interacting spin systems, becomes time-consuming 
and more difficult when one starts numerical investiga- 
tion of larger systems. 

One reason of this difficulty is, of course, the large 
dimension of the Hilbert space or the Hamiltonian ma- 
trix of many-electron systems. The dimension of the 
Hilbert space grows exponentially with increasing num- 
ber of atoms linearly in a many-electron system and, on 
the contrary, that in a one electron problem (or the den- 
sity functional theory, DFT) the size of the Hamiltonian 
matrix is proportional to the number of atoms. The 
second reason is the fact that the rigorousness or accu- 
racy control becomes seriously difficult in a problem of 
the large Hamiltonian matrix. Because the width of the 
spectra is in proportion to the number of atoms in many 
cases, the energy interval between adjacent eigenenergies 
becomes small quite rapidly with increasing number of 
atoms. The short interval between adjacent eigenener- 
gies causes the difficulty in separating of respective eigen- 
vectors. Then, for example, it is very important to ob- 
tain the precise ground state, from which all the physical 
quantities are derived in the (zero-temperature) many- 
electron theory. Thus, one needs higher energy resolution 
with increasing number of atoms, but, sometimes, we do 
not have fast, reliable and stable calculation algorithm 
for large Hamiltonian matrices. 

Our main target is the calculation of the Green's func- 



tion matrix G(u) in many-electron problems; 

G ij (u>) = [(cj + ir,-H)- 1 ] ij , (1) 

where H and ui are a real Hamiltonian matrix and energy 
parameter, respectively. The suffices i and j denote arbi- 
trary state such as c\ |) or c] |), where c is an annihilation 
operator and |) is a ground state. Here, we should use a 
positive finite parameter 77 in a numerical calculation of 
a finite system, instead of infinitesimally small positive 
number. The spectral function is the important physical 
quantity derived from the Green's function Eq. {1} . 

There are two possibilities for calculating Eq. dXJ) . One 
is to solve the eigenvalue problem with an eigenvalue w, 
e.g. the Lanczos method. The other is to solve following 
linear equation and to take inner product between the 
solution and vector \i), 



. def 

A = lo + 177 
A\x 3 ) = \j), 



b 3/ ' 



(2) 
(3) 
(4) 



with an arbitrary energy parameter to, e.g. the shifted 
COCG (conjugate-orthogonal-conjugate-gradient) 
method, a family of the CG (conjugate-gradient) 
method. In both cases, we first restrict the space 
dimension of states to be finite. In other words, we 
assume the size of the Hamiltonian matrix to be finite. 
Then we construct the Krylov subspace defined as 

£n(A li» = W \j),A\j),A 2 \j},...,A n \j)}. (5) 

In the Lanczos method, orthogonalized base vec- 
tors (Lanczos vectors) are successively generated in 
IC n (A, \j)), and at the same time, the Hamiltonian ma- 
trix is tridiagonalized. In a large scale calculation, one 
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can only use a small Krylov subspace, because of heavy 
load of computation and a corruption of the orthogonal- 
ity of generated basis vectors. It is well known that the 
rounding error breaks down the orthogonality of the gen- 
erated base vectors rapidly, when the dimension of the 
Krylov subspace exceeds several tens. The corruption of 
the orthogonality causes spurious eigenvalues and, more 
seriously, incorrect eigenvectors. Therefore, the size of 
the Krylov subspace should be limited usually to some 
tens or a hundred. 

We developed the shifted COCG method, where the 
Eq. (U|) is solved within the Krylov subspace, and ap- 
plied it to the one-electron tight-binding Hamiltonian in 
the system with a large number of atoms^ A set of or- 
thogonal base is created by the iterative process of the 
shifted COCG method, like Lanczos process, but the cal- 
culation is stable for large dimension of the Krylov sub- 
space, in contrast to the Lanczos method. We must solve 
the Eq. (J4|), for every scalar shift a of A correspond- 
ing to respective energy mesh point. The number of 
the cr's is as much as O(10 2 )~O(10 4 ) generally, however, 
the most time-consuming matrix-vector operations are 
needed only at a single reference energy (a = 0). Then 
the order of the total amount of calculation is just the 
same as Lanczos method. The reduction of the matrix- 
vector operation at non-zero a are based on the fact that 
a power of (A + a) is decomposed into a linear combina- 
tion of powers of A. Thus, Krylov subspace is invariant 
JC n (A, \j)) = K n {A + a, \j)) against a. 

In the application of this method to the many-electron 
theory, because the dimension of the vectors is huge, we 
must take care for the total amount of base vector storage 
for K n (A, in order to satisfy the memory constraint 
in modern computers. We explain the innermost loop 
index should be the iteration step n, for an extremely 
large size of the Hamiltonian matrix. This structure also 
give us following additional two merits. One is that a part 
of the program code can be used in the inverse iteration 
process to improve the ground state. Another is that the 
calculation with the different rj can be done without time 
consuming matrix- vector operations. 

The structure of the paper is as follows. In Sec. HH the 
basics of the shifted COCG method is explained briefly. 
Section IIIII is devoted to explanation of how to obtain 
global convergence. Then we apply the shifted COCG 
method to an extended Hubbard Hamiltonian with or- 
bital degeneracy and intra- and inter-site Coulomb inter- 
actions in Sec. HVi where the size of Hamiltonian matrix 
is equal to 64,128,064. We calculate one-electron excita- 
tion spectra and evaluate the insulating gap. In Sec. [Vj 
we will conclude that the essential difficulties of numeri- 
cal investigation of many-electron problems, the accuracy 
control (or monitoring) and the robustness are achieved 
by the present method, within the moderate amount of 
memory space. We explain the two points to understand 
the mathematics in the back ground of the shifted COCG 
method in Appendix [A] The practical design of storing 
the huge Hamiltonian matrix is discussed in Appendix IB"] 



II. SHIFTED COCG METHOD 

Assuming that the Hamiltonian is represented by using 
iV-dimensional real matrix H and A is a complex sym- 
metric matrix w ro f + i^ lc f — H, we should solve the linear 
simultaneous equation of 



Ax 



and its shifted equation 



(6) 



(7) 



where a = (w+ir)) — (w re f +W7ref)- We represent quantities 
q in the shifted system as q a . The right hand side b 
represents \j) in Eq. ([3]). We assume that the vector b is 
a real and normalized. 

In the family of CG method, here the shifted COCG 
method, it is important that the approximate solution of 
Eq. ^ is searched within the Krylov subspace K n {A, b). 
The subspace K. n {A, b) becomes whole space at n = N — 
1, and the solution becomes exact. 

The accuracy of the approximate solution at n-th iter- 
ation x n is evaluated by using the residual vector, 



v n b Ax ni 



(8) 



and the iteration is stopped as soon as the norm of the 
residual vector, ||t*„||, satisfies the criterion for the con- 
vergence. 

The residual vectors are "orthogonalized" with re- 
spect to the non-standard "inner-product" (u, v) — u T v. 
When r\ = 0, all the relevant vectors are real and the 
"inner-product" and "orthogonality" reduce to standard 
ones, respectively. Because r„'s are "orthogonalized", it 
is convenient to use them as base vectors of JC n (A, b) . In 
addition to that, owning to the "orthogonality", we ob- 
tain the important theorem of "collinear residua? (See 
appendix [A} . 



A. COCG method 

The shifted COCG method starts from the COCG 
method- solving Eq. ©. We define x n , p n and r n as 
the approximate solution at n-th iteration, the searching 
direction to the approximate solution at the next itera- 
tion, and the residual vector, respectively. At a reference 
energy, we must solve the following equations under the 
initial conditions, Xq = p_i = 0, rg — b, a_i = 1, and 
0-1 = 0: 



r„ 

Pn 

Ot n -l 

Pn-1 



Xn-1 + Ctn-lPn-1, 
r„-\ - On-iApn-t, 

r n + Pn-lPn-l, 

(r ra _i,T- n _i) 
(Pn-uApn-l)' 

(r„_i, r n _i) ' 



(9) 
(10) 

(11) 
(12) 

(13) 
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Here, we must notice the fact, in the procedure of itera- 
tion, (v,v) = can happen though v + 1 0£ This cannot 
happen in the CG method (?7 r ef = 0, the matrix A is 
positive definite) and the other part is perfectly identical 
to the CG method. A set of residual vector r n forms 
the "orthogonalized" base. This "orthogonality" is very 
important for us to understand the theorem of collinear 
residual. We explain it in detail in Appendix [A"l 

We can choose an alternative set of the recurrence 
equations, as follows. Eliminating p's from Eqs. (fTU]) and 
we obtain the recurrence equation of r n , 

r n +i = [ 1 + — ot n A ) r n r n _i. (14) 



OL n -\ 



O-n-X 



Taking "inner product" between r n and the Eq. (|14[) . we 
obtain 



(r„,r n ) 



( r n , Ar n ) — a "_* (v n , r n ) 



(15) 



Then the Eqs. ([15]). l[I5j) and HH) can produce all the 
base vectors, rfe's (k > n), when a n _i, r n _i and r„ are 
supplied. 



B. Shifted equations 

The key to the reduction of the matrix-vector opera- 
tions in solving the shifted system Eq. ([7]), is the theorem 
of collinear residual: 



1 



r = 



(16) 



where the tt (T is a scalar function (actually polynomial) of 
a. Then, once {r„} are given, the base set {r^}„ for the 
arbitrarily shifted system can be obtained by using scalar 
multiplication. We obtain the recurrence equations that 
determines 7r£,a£,/3£,a;£, and p£, from Eqs. (|9"j)^ (Tni)) . 
with replacing A by A + a, with the same initial condi- 
tions: 



<+i = 1 + 



Pn-ia n 
a n -i 



OL n (J VT„ 7T, 



n— 1 ' 



l n+l 



/3 CT = 



'n+1 . 

x n = "^n— 1 "I" a n— iPn— li 



(17) 
(18) 

(19) 
(20) 
(21) 



These recurrence equations can be solved without 
time consuming matrix-vector operation. In addi- 
tion to that, each component of the vector Eqs. (fT7|) ^ (T5T|) 
can be solved separately, due to the absence of the matrix 
operation. 



Crucial remarks for extremely large matrix to 
save required memory space 



For the solution of the relatively small matrix (Dim < 
10 4 ), any loop structure of shifted COCG method can be 
applicable. However, in the many-electron theory, the di- 
mension of the intermediate vectors is huge and then the 
number of intermediate vectors is restricted to some tens 
or hundreds. In the standard loop structure which From- 
mer showed^ the outermost loop index is the iteration 
step n, and, all the vectors p£, and x° for every energy 
mesh point a, are required in order to start the calcula- 
tion at the iteration step n + Then all the energy mesh 
points must be fixed before the calculation starts. Be- 
cause the calculation at the respective energy mesh points 
are independent to each other, the loop structure can be 
transformed such that the reference system is solved with 
the COCG method storing the {«„}„,{/?„}„ and {r„}„, 
then the shifted systems are solved with stored informa- 
tion about reference system for each energy mesh points. 
Here the innermost loop index is the iteration step n. 
Since the number of energy mesh points is larger than 
the number of iteration generally, the latter transformed 
loop structure requires smaller memory than the original 
one. In addition to that, we need not to prepare energy 
mesh points {a} because all the required information re- 
lated to the reference system are stored in the COCG 
process, the preceded part of the algorithm. Then, for 
example, we can change the smearing factor r\ freely with- 
out repeating COCG process that includes matrix-vector 
operations. 

Further reduction of the required memory is possible, 
with further modification of the recurrence equations for 
the shifted system. Assuming that a real constant vec- 
tor c is an adjoint vector and taking the inner product 
between c and the Eqs. (|20p and (|2"Tj) , we obtain a set of 
self-contained equations for determining the (c, x n ), n-th 
approximate solution of the element of Green's function, 
due to the absence of matrix- vector operations. 



In the applications in Sec. HVi we are interested in the 
case where c = b in order to calculate the trace of the 
Green's function. Therefore, we need to store (6, r„), a n , 
j3 n and ||r n ||, in the COCG part, and later, solve only 
the 6-component of Eqs. (|2U)l and (f2"Tj) . Here, the norm of 
the residual vector \\r%\\ = T^r||r n || is not necessary to 
solve the recurrence equations but is used to monitor the 
convergence of the approximate solution. Additionally, 
we store the full components of the last two r„'s in the 
COCG part, in order to extends the iteration number in 
the seed switching part (subsection IIII Bp . 

Even if the full components of the Green's function are 
needed, we need to store just a few components of r n , 
because the suffix of the Green's function denotes the 
one-electron orbitals, the number of which is very small 
compared to the dimension of many-electron Hamilto- 
nian H . 
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D. Preparation of ground state wavefunction 

The transformation of the loop structure in the sub- 
section lll Cl increase the re-usability of the program code. 
The COCG part of the code can also be used in the pro- 
cess to improve the ground state wavefunction as follows. 

First we use the Lanczos method in order to tridiago- 
nalize the Krylov subspace, then, obtain the ground state 
diagonalizing it. The calculated lowest eigenenergy con- 
verges rapidly with increase of the dimension of the sub- 
space, but the wavefunction does not, due to the unsta- 
ble orthogonality against the inevitable rounding error. 
Next we improve the approximate eigenenergy and the 
wavefunction with the inverse iteration method. Because 
the COCG process with the real arithmetics is the same 
as CG process, here we can use the COCG part of the 
shifted COCG algorithm whose loop structure is changed 
as in the subsection III C^ 

Since the inverse iteration method works only when 
the approximate eigenvalue and eigenvector are given, 
the first Lanczos process can not be omitted. If the ac- 
curacy of the calculated wavefunction is not enough, the 
processes are repeated with replacing the initial Lanczos 
vector by the latest approximate wavefunction. 

III. ACCURACY AND SEED SWITCHING 

A. Estimating accuracy of Green's function 

In this subsection, we explain the accuracy of the 
Green's function calculated by the shifted COCG method 
and give its estimation. Here G oxac t is the "exact" solu- 
tion of Eqs. (O and (|3|) for a given finite value of r\. 
Then we say that "the calculated Green's function is ac- 
curate", when the I {GsC ° \ g~1\T^ I ( hereaiter > "accu- 
racy") is small. The "accuracy" and ||r„|| are generally 
"truncation error" of G and x n , respectively. Because 
the shifted system is equivalent to the reference system, 
then we can estimate the accuracy for the shifted system, 
with replacing A by A + a and any other quantities {q} 

i'.v {<r\. 

We can derive following equation from Eqs. ©>© and 



[GsCOCG — Ccxactljj 




(b,A~ 




[G C xact\ jj 




(b,A 





(22) 



If the matrix A was positive definite real symmetric ma- 
trix and the vector b and r n were real vectors, the up- 
per bound of the right hand side of Eq. (|22"|) is equal to 

||b|| = H r «ll- 

When the matrix A can be fully diagonalized numeri- 
cally, we can estimate G eX act within rounding errors, and 
then, obtain the "accuracy" of the approximate Green's 
function calculated by shifted COCG method. 

The dotted and solid line in the Fig. Q] show the "accu- 
racy" of [G s cocg]j\j and the norm of the residual vector 



FIG. 1: An example of the "accuracy" of [GscocgIjj, \\ r n\\ 
and the imaginary part of the "exact" solution Im[G e xact]jj 
(See text). The dimension of the matrix A here is equal to 
8,960, in the same model in Sec. lIVI The reference energy and 
smearing factor are equal to lo tc { — 6.04eV and r\ — 0.05eV . 
The iteration number is equal to 800. 



FIG. 2: The seed (w ro f + i^rcf) switching and the norm of 
residual vector ||r„|| at respective seed (the black solid line). 
Here the 7; ro f is a constant and equal to 0.05eV. The dashed 
vertical line indicates the iteration number where the seed is 
switched. The gray lines show the ||r^ max ||'s (see text). The 
calculated values of ||r n || at uj la { — 8.95eV are multiplied by 
the factor of 100 in order to avoid overlap. 



||r n ||, respectively. The dashed line shows the excitation 
spectra. The figure shows that the "accuracy" is bounded 
by the norm of the residual vector. Therefore, we can es- 
timate the "accuracy" of the calculated Green's function 
by using ||r n ||, without the knowledge of the "exact" so- 
lution Goxact- We can also see from the figure that the 
Green's function calculated by the shifted COCG method 
is accurate more, near the bounds of the spectra. 



B. Seed switching 

Assuming that the approximate solution of the refer- 
ence system, Eqs. ©^(HSl), converges at M-th iteration, 
we can solve the shifted system, Eqs. (frT)) ^ (f2"T]) , up to 
the same M-th iteration. When the approximate so- 
lution of the shifted system does not converges at any 
lu + ir/ = to re { + ir] le i + £J, we should extend the iteration 
of the reference system. However at u> ro f-|-i?7 ro f , the exten- 
sion does not improve the approximate solution, since the 
norm of the residual vector is considerably small already. 
In that occasion, we should change the seed (w re f + h?ref ) 
to a new one, w" c ° f w + h7" e e f w , where the norm of the resid- 
ual vector is large and the approximate solution does not 
converge. Because the shifted system is equivalent to the 
reference system, we can change the seed as follows, with- 
out disposing the previous calculation at the old w re f £ 

We define er max so that r^" ax = Max CT {r^}, where 
Max CT means the maximum value on the er-mesh (en- 
ergy mesh) points. Because the w ro f + i77 rc f + er max is 
the prime candidate for the energy of the slowest con- 
vergence, we choose it as the w" e c f w + h?"™- Then, 
aj m ",/Jj°", and r£ max (k = 0, 1, • • • , M) are calculated 
and replaces the old values at old reference energy, re- 
spectively. Finally, the recurrence Eqs. (|9j)^ (fT3|) of the 
COCG method at u^ e e f w + irr^f are calculated until the 
solution converges at M now -th iteration. The important 
point of the seed switching is that we can recalculate new 
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a n 's, PnS, and H^nll's (0 < n < M) without any matrix- 
vector operation, though the matrix-vector operation is 
required to calculate the new ones at further n'-th itera- 
tion (M < ri < M ncw ). 

The same remark as the subsection III CI is applicable 
to the implementation of the seed switching. The shifted 
COCG algorithm with seed switching of any loop struc- 
ture can be applicable to relatively small matrices, but, 
we must change the loop structure of it from the previ- 
ous one£, since the size of the intermediate vector is huge 
in the many-electron theory. We must even change the 
recurrence equation of r n to Eq. (fH|) , so that the r„ +1 
is calculated only with the r ra _i and r n , in stead of all 
{r n } n= i,2,...,n, for extremely huge matrices. When we 
store {a n ,/3n}n=0,i,...,M-i, tm-i and tm in the COCG 
process, then we can calculate ol^\, r1f^\ and r^} nax , 
which are required for the following calculations. Then 
we calculate the new reference system up to Af now -th 
iteration step, by using Eqs. (fTB")) . (fT5|) and (fT4"|) . 

Figure [5] shows the example of the seed switching. The 
system is the same one as in Fig. [T] except w re f. Here the 
f] = Vrcf = 0.05eV. At the uj tc { = — 1.40eV, ||r„|| decrease 
exponentially, and satisfies the criterion ||r n || < 10~ 10 at 
iteration step n = 330. However, there are many energies 
where the converging speed of the residual vector \\r°\\ 
is slower than that at ui rc {. Then the tr max = 7.43eV is 
searched and the w" e e f w is shifted to be 6.03eV. We need 
seed switching twice more at 1509- and 3297-th iteration 
in order to obtain the global convergence. The largest 
value of the "accuracy" is 1.5 x 10~ 10 at the last iteration 
step n = 3455. 



C. Robustness of shifted COCG method 

In this subsection, we explain the robustness of the 
shifted COCG method, which is very important to ob- 
tain the converged approximate solution, especially in the 
case of the long iteration. We say that the calculation is 
robust, when the calculation is stable against the pertur- 
bation. For examples, the orthogonality of {r n } is not 
a robust property, because the inevitable rounding error 
perturb the calculation and the orthogonality is broken 
down quickly. 

The robustness of the shifted COCG method consists 
of two parts. One is the robustness of COCG method at 
the reference energy uj rc f . And the other is the robustness 
of the iterative solution of the shifted equations. Figure [2] 
shows the robustness of the COCG method at w rc f, be- 
cause the norm of the residual vector Hr^H goes to in 
spite of long iteration 3,540. The global convergence of 
the "accuracy" that is mentioned at the end of the sub- 
section llll Bl shows the robustness of the iterative solution 
of the shifted equations. In the shifted COCG method, 
the "orthogonality" of base vectors {r n } is not necessary 
for reducing ||r n ||, in contrast to the fact that the sub- 
space diagonalization methods requires the unitarity of 
the base vectors. 



IV. APPLICATION OF THE SHIFTED COCG 
METHOD TO THE MANY-ELECTRON 
PROBLEM 

Here we apply the shifted COCG method, to the dou- 
ble orbital extended Hubbard Hamiltonian and calculate 
the excitation spectral 



A. Hamiltonian of La3 Sri Ni04 

2 2 

The experimental results show that the layered per- 

ovskite La3SriNi04 is an insulator with charge and spin 

2 2 | | 

stripe order, as depicted in Fig. [3] The charge and spin 

structures of the single layer of La | Sr i N1O4 (pseudo two- 
dimensional system), choosing Ni 3d e g orbitals as rele- 
vant ones, was studied with the extended Hubbard model 
recently^ Here we use the same Hamiltonian. 

+ U^n ial n lal + (U - 2 J) ^ n h3z 2_ l a n hx 2_ y 2 a , 
J /„f „•)■»» »f „■)■„» \ 

+ "2 2-^1 \ C ia<y C i0<7' Cia < J ' Ci l 3<J "r" C ia<7 C ia(T' C i0<7' C il3o- J 
a, a' 

+ V hiavfljpa', (23) 

(id), a, 

where the suffix {i,j} denote the site, {a, (3} denote the 
orbital 3z 2 — 1 or x 2 — y 2 , and {cr, a'} denote the spin 
co-ordinate. The annihilation and number operator are c 
and h, respectively. The symbol t, e, U , J, V denote the 
Slater-Koster type hopping parameter, single electron en- 
ergy, on-site Coulomb interaction, on-site exchange inter- 
action, intersite Coulomb interaction, respectively. Hop- 
ping parameters are finite for nearest neighbor (n.n.) and 
second n.n. pair of sites. The braces (• • •) means that 
two sites enclosed by them are the n.n. sites. 

Though the anisotropy of the hopping parameters for 
the second n.n. pair stabilizes the spin structure^ we 
choose the isotropic (tetragonal) parameter set shown in 
Table fl] The role of U and J in the present situation 
is stabilization of integral valency of Ni ions (Ni 3+ and 
Ni 2+ ) and spin polarization. 

tddu tddS jt'dda + jt'ddi t' ddw A U J V 

-0.543 0.058 -0.018 -0.023 0.97 7.5 0.88 0.5 

TABLE I: The values of parameters in the Hamiltonian in 
unit of eV.— The prime symbol at the right shoulder of "t" 
means the second n.n. hopping. 
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FIG. 3: Experimentally observed charge and spin order in the 
La3SriNi0 4 .- 



B. Ground state of La3SriNi04 

2 2 

Here we summarize the properties of the calculated 

around state of the La3 Sr i MO4. 8 The calculated around 

22 

state shows the charge and spin stripe order consistent 
with experimental observation and the system is insu- 
lator. Diagonal hole stripes are separately localized on 
Ni 3+ site in order to reduce hole-hole interaction energy 
induced by inter-site Coulomb interaction V. Charge or- 
der and the inter-site Coulomb interaction V arc directly 
related to the energy gap in the excitation spectra of the 
system. Spin stripe occurs only under the condition of 
the existence of multi-orbitals and the charge order with 
a help of anisotropy. The spin stripe is determined by 
the electronic structure with smaller energy scale than 
that of the charge stripe. 



C. Computational details 

Here we explain miscellaneous computational details. 

The calculated system is two-dimensional square lat- 
tice. There are 12 electrons on the periodic y/8 x \/8 
sites. Because the total S z of the system is preserved, we 
can reduce the number of relevant many-electron states 
to 64,128,064, by using the condition S z = 0. 

The smearing factor 77 is also an arbitrary parameter 
in the present paper. Here we explain how we chose the 
value of 77. The energy scale of the low energy excita- 
tions is t ~ V ~ O(10 _1 eV'), because the value of on- 
site Coulomb interaction U is much larger. Therefore we 
must set 77 lower than 10 _1 eV, so that 77 does not smear 
out the finer structure of the spectra than itself. There is 
another restriction that the interval of the energy mesh 
is small enough than 77, in order to see the fine peak 
structure of the spectra. Then, because the calculation 
time increases with decreasing 77, the value of 77 is roughly 
determined as 0(lCT 3 eV)~0(lCr 2 eV). 

Next point is the criterion for the convergence of the 
ground state vector. Our calculations are of the double 
precision and the rounding error is inevitable (~ 10 -16 ) 
in the each component of the eigenvector. Assuming the 
accumulated error is of 0(VN) (N =64,128,064), the ac- 
curacy is expected to be 10 -16 x \/]V ~ 10 -12 . We set the 
allowance for the estimation by factor of 10 2 , and the cri- 
terion for the accuracy of calculated ground state energy 

E gs and eigenvector |) is J (\ (H - E gs ) 2 |) < 10~ 10 . 



FIG. 4: The spectral functions of the present Hamiltonian 
of double orbital extended Hubbard model with 12 electrons 
on the periodic \/8 x v8 sites (gray line) and their "accu- 
racy" (black line, see text). The spectra of affinity and ion- 
ization levels are calculated separately by using the shifted 
COCG method, for the given smearing factor rj — O.OleV 
(upper panel) and r\ — O.lOeV (lower panel). The highest 
occupied level is at 9.4eV and lowest unoccupied 10.3eV. In- 
tersite Coulomb interaction V = 0.5eV. Energy zeroth is set 
at the ground state energy 36.755eV of 12 electron system. 



D. Spectral function 



We examined the spectral function 



1, 



A{uj) = Tr [ImG(w)] 

7T 



(24) 



This can be easily evaluated by the shifted COCG 
method. Figure [4] shows the spectral functions of the 
state D at V = 0.5eV. The upper and the lower panel 
show the case of 77 = O.leV and 77 = O.OleV, respectively. 
Both of them are calculated from the same COCG calcu- 
lations and the only difference between them is the imag- 
inary part of the energy shift a. The spectra of ionization 
and affinity levels are calculated separately and the ui rc f 's 
for respective spectra are chosen to be (9.0 + i0.01)eV and 
(10.4 + i0.01)eV. The highest occupied level is at 9.4eV 
and lowest unoccupied 10.3eV. The number of iterations 
equals to 800 for each spectra. If one attempt to obtain 
the profile of the spectra with smoothly connected curves 
as is in the bulk limit, one should set the value of 77 suffi- 

,1 , ,1 (width of spectra) . , 

cicntly larger than -V- —. l - — c — '- , m order to smear 

J (iteration number) ' 

out the excessive peaks caused by the finite system. Be- 
cause iteration number equals to 800 in the present calcu- 
lations, this criterion becomes 77 3> 0.03eV, and the gray 
curves in the upper panel of Fig. [?] shows the smooth 
profile of the spectral function. If one attempt to see 
whether the energy gap opens at the boundary between 
affinity and ionization levels, one must choose sufficiently 
smaller 77 than the width of energy gap. In the present 
calculation, this criterion becomes 77 <C 0.9eV, and, the 
gray curves in the lower panel of Fig. Ushow the energy 
gap around ui = 9.8eV. We can choose 77 independent 
with the reference energy, then the energetic resolution 
of the spectral function can be changed after all the time 
consuming matrix- vector operations have been finished. 

The black curves in the upper and lower panels of 
the Fig. [4] show the "accuracy" of the respective spec- 
tral functions, and the spectra are extremely accurate 
near the boundary of the spectra (u — 9.8eV), where 
the energy gap is open. Therefore, we conclude from the 
lower panel of Fig.[4]that the ground state of the present 
Hamiltonian is insulator. 

Changing the value of V continuously from 0.5eV to 
O.OeV, we find that the system becomes metal4 There- 
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fore, the intersite Coulomb interaction makes the present 
system insulator, unlike the usual transition metal oxide 
where the large on-site Coulomb interaction makes the 
system insulator. 



V. DISCUSSION AND SUMMARY 

Once COCG method is applied to the reference system 
Eq. ((6J, the shifted system Eq. ([7|) is solved without time 
consuming matrix-vector operations, by shifted COCG 
method. This notable property is due to the mathemat- 
ical structure of COCG method, such that the residual 
vector's are forming the "orthogonal" base set of vectors, 
whose direction does not change against a. This reduc- 
tion of the matrix-vector operation extremely accelerate 
the calculation speed of Green's function G(u>), keeping 
the robustness of COCG method. Simultaneously, the 
accuracy of the approximate G(lu) is easily estimated as 
the norm of the newly generated base vector (residual 
vector) at the latest iteration. The total accuracy of 
the shifted COCG method varies depending on u, a and 
tti ref i£ and generally very small near the bounds of the re- 
spective spectra. In the many-electron Green's function, 
we are usually interested in the low energy excitations, 
in other words, the spectra near the boundary between 
affinity and ionization levels. Therefore, we can calcu- 
late the Green's function accurately and quickly by the 
shifted COCG method, in the interesting energy range. 

Another problem in the application of the COCG 
method to the many-electron theory is a memory con- 
straint due to the extremely large size of vectors and 
matrices. We resolved this problem with separating the 
COCG part for the reference system and the part for 
the shifted equations, changing the loop structure. This 
change give us the following two merits. One is a usage 
of the former part for improving the ground state, as is 
mentioned in the subsection III Dt The other is the fast 
calculation of changing smearing factor r], which is just 
a imaginary shift. When we do not know the proper en- 
ergy scale a priori, the width of the energy gap in the 
present paper or the proper value of 77, this merit is very 
important. 

The seed switching is a very important idea for the 
shifted COCG method to give global convergence, which 
means that the calculated solution converges everywhere 
in the interested energy region. Because it takes much 
iteration steps to converge the solution especially in the 
middle of the spectra, sometimes we must discontinue 
the iteration step before obtaining global convergence. 
In that case, we must examine the accuracy of the result 
and check if the solutions in the required energy range 
satisfy the criterion. 

The applicability of the above reconstruction and the 
seed switching are not specific to the many-electron prob- 
lem. We can apply them to the general solution of the 
Green's function of extremely large dimension. 

We applied the shifted COCG method to the charge 



and spin order in La|SriNi04, where the intersite 
Coulomb interaction, relatively small compared with on- 
site one, plays an important role. Then we conclude the 
relatively small energy gap opens at the Fermi energy, 
and the system becomes insulator, due to the intersite 
Coulomb interaction. 
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APPENDIX A: MATHEMATICAL STRUCTURE 
OF SHIFTED COCG METHODS 

Here we explain the two important points to under- 
stand the mathematical structure of the shifted COCG 
methods 

One is the "orthogonality" of the residual vectors {r^} 
with respect to a non-standard "inner-product" (u, v) = 
u T v, which the theorem of collinear residual is based 
on. On should be noticed that this "orthogonality" is 
different from the well-known "A-orthogonality" of the 
searching directions {pk}- Because {rk}k=o,i,...,n— 1 is a 
base set of IC n -i(A, b), the "orthogonality" is also repre- 
sented as follows 



r„ e JC n (A,b) and r„ _L K. n -i(A, b), (Al) 

where the over line stands for taking conjugate. There- 
fore, the direction of r n is uniquely determined by above 
equation, for arbitrary A, as the 1-dimensional comple- 
mentary space of /C n _i(A, b) within }C n (A, b). Then the 
theorem of the collinear residual Eq. (|16p is derived from 
the invariance of Krylov subspace against a. 

Another point appears in the similarity between the 
two Eqs. (|14p and (|17p. Representing x n and r n as 
X n ^x(A)b e /C„_i(A, b) and R n (A)b € fC n (A,b), respec- 
tively, we can derive the relation between two polynomi- 
als Rn(t) = 1 —tX n -i(t) from Eq. (jHJ). This relation and 
the theorem of the collinear residual lead to the relation 
between the two polynomials 7r£ = R n (—a). 2 This re- 
lation explains the similarity between the two Eqs. (fl4|) 
and (117|) . and the plus sign in front of a in the latter 
equation. 

Thus the mathematical structure of the shifted COCG 
method consists of two structures, that of a vector space 
and that of a set of polynomials. The non-standard "in- 
ner product" in the shifted COCG method can be rec- 
ognized as the conservation of the analytic property as a 
polynomial of a. 
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APPENDIX B: REDUCING STORAGE SIZE OF 
HAMILTONIAN 

In investigating the properties of the many-electron 
Hamiltonian, there is a trade-off between the speed of 
matrix-vector operation and the amount of the mem- 
ory where the matrix elements of the Hamiltonian are 
stored. Assuming the all non-zero matrix elements are 
stored separately and the number of the non-zero matrix 
elements per each column is equal to 20, then about 10 
GB is required to store the Hamiltonian in the case of 
present paper. That is too much for the most of modern 
computers. In stead, if the operation of the Hamiltonian 
on the vectors is implemented as the summation of the 
respective term in Eq. (|23p , then the number of operator 
equals to 640, and, the memory required to store 640 op- 
erators is negligibly small compared to 10GB. However, 
these 640 operators must be applied to the vectors, for 
single operation of the Hamiltonian. Therefore, the cal- 
culation time increases, compared to the case of the all 
elements of the Hamiltonian are stored. 

For simplicity, the Fermion sign and the two parti- 
cle operators in the Hamiltonian are neglected hereafter, 
then, the difference between above two manners of stor- 
ing the Hamiltonian are described as follows, mathemat- 
ically We define V as a whole vector space of the sin- 
gle electron state and decompose it into the direct sum 
V = ® a V a , a denotes orbital or spin or arbitrary combi- 
nations of relevant quantum number. Consequently, the 
whole space where n-body Hamiltonian acts is described 
as 



V 



V — ®ai,...,a n Vai 



(Bl) 



Decomposing single particle operator P as P = 
/3 -Pa/3' PapVp G V a , then the action of P on the 
® n V is decomposed as 



P = (^P Q /3®1 



1) 



(1 ® . . . (g) 1 (g) ^ P a/3 (g) 1 (g . . . <g 1) + . . . 

(l<g...<gl<g^P a/3 ), (B2) 

a, 13 



since the single particle operator P acts on ® n V as 
(P(g)l(g)...(g)l) + ... + (l(g)...(g)l(g)P(g)l(g)...(g) 
1) + . . . + (1 (g . . . ® 1 ® P). This decomposition is 
trivial but one should be noticed that the dimension 
of respective subspace decrease exponentially with n as 
dim(V Ql (g . . . (g V an ) = {—^) n , where m is the number 
of the partition V = ®aYa- ln contrast, the number of 
the terms to be summed up increases in proportion to 
nm 2 Therefore, the total number of the matrix elements 
decrease by this decomposition in Eq. (|B2p . Extreme 
limit of the respective V a consists of only one dimension 
is corresponding to the above mentioned case of 640 op- 
erators. 



In the present example of the spectral function, we 
choose the decomposition V = V\ © Vj, , because total S z 
is preserved, and consequently, the numbers of |- and J,- 
electrons are preserved, and more, the hopping part of 
the present Hamiltonian does not have the cross term 
with respect to spin. This partition of the vector space 
leads to more simplified partition of the operator than 
Eq. p32|) . H = H n <g)l + l<E)H ll + (residual part)^ where 
-Hf! and are including the hopping term with respect 
to each spin and a part of on-site Coulomb interaction. 
As a result, the size of the memory area where the values 
of the matrix elements are stored is about 1GB. Actu- 
ally, an extra 0.5GB is required for storing the indexes 
of the place of non-zero elements, then, totally 1.5GB is 
required for storing the whole Hamiltonian. That is not 
so big for a modern computer. 
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